pacman::p_load(rgdal, spdep, tmap, sf,
ggpubr, corrplot, tidyverse, funModeling, blorr, GWmodel, skimr, caret, report)In-class_Ex5 Modeling the Spatial Variation and the Explanatory factors of Water Point Status using Geographic Weighted Logistic Regression
Setting the scene
To build an explanatory model to discover factor affecting water point status in Osun state, Nigeria
Study area: Osun State, Nigeria
Data sets:
Osun.rds, contains LGAs boundaries of Osun State. It is in sf polygon dataframe, and
Osun_wp_sf.rds, contained water points within Osun State. It is in sf point dataframe.
Model Variables
Dependent variable: water point status (ie. functional/non-functional)
Independent variables:
distance_primary_road
distance_to_secondary_road
distance_to_tertiary_road
distance_to_city
distance_to_town
water_point_population
local_population_1km
usage_capacity,
is_urban,
water_source_clean
Getting start
create in-class Exercise 5 folder
Write a code chunk to load the following R package: sf, tidyverse, funModeling, blorr, corrplot, ggpubr, sf, spdep, GWmodel, tmap, skimr, caret
sf for importing and processing geospatial data,
tidyverse for importing and processing non-spatial data. In this exercise, readr package will be used for importing wkt data and dplyr package will be used to wrangling the data.
Spatial data handling
- sf and spdep
Attribute data handling
- tidyverse
Choropleth mapping
- tmap
Multivariate data visualisation and analysis
- coorplot, ggpubr,
Classification And REgression Training
- caret
Automated Reporting
- report
Binary Logistic Regression Models
- blorr
Geographically-Weighted Models
- GWmodel
summary statistics about variables in data frames
- skimr
Data Import
The code chunk below load the datasets into R.
Osun <- read_rds("rds/Osun.rds")
Osun_wp_sf <- read_rds("rds/Osun_wp_sf.rds")Thanks to Prof, the Osun and Osun_wp_sf datasets have been pre-cleaned and wrangled. In the code chunk below, freq() of funModeling package is used to display the distribution of status field in Osun_wp_sf.
Osun_wp_sf %>%
freq(input = "status")Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.

status frequency percentage cumulative_perc
1 TRUE 2642 55.5 55.5
2 FALSE 2118 44.5 100.0
From the plot we can tell Osun consists of about 55.5% of functional water points, 44.5% of non-functional water points. Unknown water points has been excluded in the dataset.
we plot the distribution of the water points by status using tmap, as shown in the code chunk below.
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(Osun)+
tm_polygons(alpha = 0.4) +
tm_shape(Osun_wp_sf) +
tm_dots(col = "status",
alpha = 0.6) +
tm_view(set.zoom.limits = c(9,12))Exploratory Data Analysis
Summary Statistics with skimr
Osun_wp_sf %>%
skim()Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
| Name | Piped data |
| Number of rows | 4760 |
| Number of columns | 75 |
| _______________________ | |
| Column type frequency: | |
| character | 47 |
| logical | 5 |
| numeric | 23 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| source | 0 | 1.00 | 5 | 44 | 0 | 2 | 0 |
| report_date | 0 | 1.00 | 22 | 22 | 0 | 42 | 0 |
| status_id | 0 | 1.00 | 2 | 7 | 0 | 3 | 0 |
| water_source_clean | 0 | 1.00 | 8 | 22 | 0 | 3 | 0 |
| water_source_category | 0 | 1.00 | 4 | 6 | 0 | 2 | 0 |
| water_tech_clean | 24 | 0.99 | 9 | 23 | 0 | 3 | 0 |
| water_tech_category | 24 | 0.99 | 9 | 15 | 0 | 2 | 0 |
| facility_type | 0 | 1.00 | 8 | 8 | 0 | 1 | 0 |
| clean_country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
| clean_adm1 | 0 | 1.00 | 3 | 5 | 0 | 5 | 0 |
| clean_adm2 | 0 | 1.00 | 3 | 14 | 0 | 35 | 0 |
| clean_adm3 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| clean_adm4 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| installer | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| management_clean | 1573 | 0.67 | 5 | 37 | 0 | 7 | 0 |
| status_clean | 0 | 1.00 | 9 | 32 | 0 | 7 | 0 |
| pay | 0 | 1.00 | 2 | 39 | 0 | 7 | 0 |
| fecal_coliform_presence | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| subjective_quality | 0 | 1.00 | 18 | 20 | 0 | 4 | 0 |
| activity_id | 4757 | 0.00 | 36 | 36 | 0 | 3 | 0 |
| scheme_id | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| wpdx_id | 0 | 1.00 | 12 | 12 | 0 | 4760 | 0 |
| notes | 0 | 1.00 | 2 | 96 | 0 | 3502 | 0 |
| orig_lnk | 4757 | 0.00 | 84 | 84 | 0 | 1 | 0 |
| photo_lnk | 41 | 0.99 | 84 | 84 | 0 | 4719 | 0 |
| country_id | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
| data_lnk | 0 | 1.00 | 79 | 96 | 0 | 2 | 0 |
| water_point_history | 0 | 1.00 | 142 | 834 | 0 | 4750 | 0 |
| clean_country_id | 0 | 1.00 | 3 | 3 | 0 | 1 | 0 |
| country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
| water_source | 0 | 1.00 | 8 | 30 | 0 | 4 | 0 |
| water_tech | 0 | 1.00 | 5 | 37 | 0 | 20 | 0 |
| adm2 | 0 | 1.00 | 3 | 14 | 0 | 33 | 0 |
| adm3 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
| management | 1573 | 0.67 | 5 | 47 | 0 | 7 | 0 |
| adm1 | 0 | 1.00 | 4 | 5 | 0 | 4 | 0 |
| New Georeferenced Column | 0 | 1.00 | 16 | 35 | 0 | 4760 | 0 |
| lat_lon_deg | 0 | 1.00 | 13 | 32 | 0 | 4760 | 0 |
| public_data_source | 0 | 1.00 | 84 | 102 | 0 | 2 | 0 |
| converted | 0 | 1.00 | 53 | 53 | 0 | 1 | 0 |
| created_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
| updated_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
| Geometry | 0 | 1.00 | 33 | 37 | 0 | 4760 | 0 |
| ADM2_EN | 0 | 1.00 | 3 | 14 | 0 | 30 | 0 |
| ADM2_PCODE | 0 | 1.00 | 8 | 8 | 0 | 30 | 0 |
| ADM1_EN | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
| ADM1_PCODE | 0 | 1.00 | 5 | 5 | 0 | 1 | 0 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| rehab_year | 4760 | 0 | NaN | : |
| rehabilitator | 4760 | 0 | NaN | : |
| is_urban | 0 | 1 | 0.39 | FAL: 2884, TRU: 1876 |
| latest_record | 0 | 1 | 1.00 | TRU: 4760 |
| status | 0 | 1 | 0.56 | TRU: 2642, FAL: 2118 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| row_id | 0 | 1.00 | 68550.48 | 10216.94 | 49601.00 | 66874.75 | 68244.50 | 69562.25 | 471319.00 | ▇▁▁▁▁ |
| lat_deg | 0 | 1.00 | 7.68 | 0.22 | 7.06 | 7.51 | 7.71 | 7.88 | 8.06 | ▁▂▇▇▇ |
| lon_deg | 0 | 1.00 | 4.54 | 0.21 | 4.08 | 4.36 | 4.56 | 4.71 | 5.06 | ▃▆▇▇▂ |
| install_year | 1144 | 0.76 | 2008.63 | 6.04 | 1917.00 | 2006.00 | 2010.00 | 2013.00 | 2015.00 | ▁▁▁▁▇ |
| fecal_coliform_value | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
| distance_to_primary_road | 0 | 1.00 | 5021.53 | 5648.34 | 0.01 | 719.36 | 2972.78 | 7314.73 | 26909.86 | ▇▂▁▁▁ |
| distance_to_secondary_road | 0 | 1.00 | 3750.47 | 3938.63 | 0.15 | 460.90 | 2554.25 | 5791.94 | 19559.48 | ▇▃▁▁▁ |
| distance_to_tertiary_road | 0 | 1.00 | 1259.28 | 1680.04 | 0.02 | 121.25 | 521.77 | 1834.42 | 10966.27 | ▇▂▁▁▁ |
| distance_to_city | 0 | 1.00 | 16663.99 | 10960.82 | 53.05 | 7930.75 | 15030.41 | 24255.75 | 47934.34 | ▇▇▆▃▁ |
| distance_to_town | 0 | 1.00 | 16726.59 | 12452.65 | 30.00 | 6876.92 | 12204.53 | 27739.46 | 44020.64 | ▇▅▃▃▂ |
| rehab_priority | 2654 | 0.44 | 489.33 | 1658.81 | 0.00 | 7.00 | 91.50 | 376.25 | 29697.00 | ▇▁▁▁▁ |
| water_point_population | 4 | 1.00 | 513.58 | 1458.92 | 0.00 | 14.00 | 119.00 | 433.25 | 29697.00 | ▇▁▁▁▁ |
| local_population_1km | 4 | 1.00 | 2727.16 | 4189.46 | 0.00 | 176.00 | 1032.00 | 3717.00 | 36118.00 | ▇▁▁▁▁ |
| crucialness_score | 798 | 0.83 | 0.26 | 0.28 | 0.00 | 0.07 | 0.15 | 0.35 | 1.00 | ▇▃▁▁▁ |
| pressure_score | 798 | 0.83 | 1.46 | 4.16 | 0.00 | 0.12 | 0.41 | 1.24 | 93.69 | ▇▁▁▁▁ |
| usage_capacity | 0 | 1.00 | 560.74 | 338.46 | 300.00 | 300.00 | 300.00 | 1000.00 | 1000.00 | ▇▁▁▁▅ |
| days_since_report | 0 | 1.00 | 2692.69 | 41.92 | 1483.00 | 2688.00 | 2693.00 | 2700.00 | 4645.00 | ▁▇▁▁▁ |
| staleness_score | 0 | 1.00 | 42.80 | 0.58 | 23.13 | 42.70 | 42.79 | 42.86 | 62.66 | ▁▁▇▁▁ |
| location_id | 0 | 1.00 | 235865.49 | 6657.60 | 23741.00 | 230638.75 | 236199.50 | 240061.25 | 267454.00 | ▁▁▁▁▇ |
| cluster_size | 0 | 1.00 | 1.05 | 0.25 | 1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▇▁▁▁▁ |
| lat_deg_original | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
| lon_deg_original | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
| count | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
With above statistics reports, we noticed in variable install_year, there are 25% are missing values. To impute the missing value with zero may introduce bias to the dataset hence we need to drop this variable before feed into the modeling.
The following variables are selected for modelling. To ensure there is no missing values in the dataset, we use all_var(!is.na(.)) to exclude the missing values.
Factors are used to represent categorical data. Here we will recode usage_capacity as categorical variables since it has only 3 types of values.
Osun_wp_sf_clean <- Osun_wp_sf %>%
filter_at(vars(status,
distance_to_primary_road,
distance_to_secondary_road,
distance_to_tertiary_road,
distance_to_city,
distance_to_town,
water_point_population,
local_population_1km,
usage_capacity,
is_urban,
water_source_clean),
all_vars(!is.na(.)))%>%
mutate(usage_capacity = as.factor(usage_capacity))Correlation Analysis
We will extract the desired variables from Osun_wp_sf_clean dataframe and assign to Osun_wp dataframe, also exclude st_set_geometry for following correlation matrix plot setp.
Osun_wp <- Osun_wp_sf_clean %>%
select(c(7,35:39,42:43,46:47,57)) %>%
st_set_geometry(NULL)cluster_vars.cor = cor(
Osun_wp[,2:7])
corrplot.mixed(cluster_vars.cor,
lower = "ellipse",
upper = "number",
tl.pos = "lt",
diag = "l",
tl.col = "black")
Noticed all variables we selected are not highly correlated with each other. Therefore we do not need to remove any of them.
Building a Logistic Regression Models
We will use below code chunk to fit generalized linear model, the function is glm.
model <- glm(status ~ distance_to_primary_road +
distance_to_secondary_road +
distance_to_tertiary_road +
distance_to_city +
distance_to_town +
is_urban +
usage_capacity +
water_source_clean +
water_point_population +
local_population_1km,
data = Osun_wp_sf_clean,
family = binomial(link = 'logit'))blr_regress is Binary logistic regression.
blr_regress(model) Model Overview
------------------------------------------------------------------------
Data Set Resp Var Obs. Df. Model Df. Residual Convergence
------------------------------------------------------------------------
data status 4756 4755 4744 TRUE
------------------------------------------------------------------------
Response Summary
--------------------------------------------------------
Outcome Frequency Outcome Frequency
--------------------------------------------------------
0 2114 1 2642
--------------------------------------------------------
Maximum Likelihood Estimates
-----------------------------------------------------------------------------------------------
Parameter DF Estimate Std. Error z value Pr(>|z|)
-----------------------------------------------------------------------------------------------
(Intercept) 1 0.3887 0.1124 3.4588 5e-04
distance_to_primary_road 1 0.0000 0.0000 -0.7153 0.4744
distance_to_secondary_road 1 0.0000 0.0000 -0.5530 0.5802
distance_to_tertiary_road 1 1e-04 0.0000 4.6708 0.0000
distance_to_city 1 0.0000 0.0000 -4.7574 0.0000
distance_to_town 1 0.0000 0.0000 -4.9170 0.0000
is_urbanTRUE 1 -0.2971 0.0819 -3.6294 3e-04
usage_capacity1000 1 -0.6230 0.0697 -8.9366 0.0000
water_source_cleanProtected Shallow Well 1 0.5040 0.0857 5.8783 0.0000
water_source_cleanProtected Spring 1 1.2882 0.4388 2.9359 0.0033
water_point_population 1 -5e-04 0.0000 -11.3686 0.0000
local_population_1km 1 3e-04 0.0000 19.2953 0.0000
-----------------------------------------------------------------------------------------------
Association of Predicted Probabilities and Observed Responses
---------------------------------------------------------------
% Concordant 0.7347 Somers' D 0.4693
% Discordant 0.2653 Gamma 0.4693
% Tied 0.0000 Tau-a 0.2318
Pairs 5585188 c 0.7347
---------------------------------------------------------------
The report above shown that P-value for variable distance_to_primary_road and distance_to_secondary_road are not statistically significant at 95% Confidence Level.
report(model)We fitted a logistic model (estimated using ML) to predict status with
distance_to_primary_road (formula: status ~ distance_to_primary_road +
distance_to_secondary_road + distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
distance_to_primary_road = 0, is at 0.39 (95% CI [0.17, 0.61], p < .001).
Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_secondary_road
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_secondary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_tertiary_road (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_tertiary_road = 0,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.39 (95%
CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.39
(95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.39 (95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to local_population_1km = 0, is at
0.39 (95% CI [0.17, 0.61], p < .001). Within this model:
- The effect of distance to primary road is statistically non-significant and
negative (beta = -4.64e-06, 95% CI [-1.74e-05, 8.07e-06], p = 0.474; Std. beta
= -0.03, 95% CI [-0.10, 0.05])
- The effect of distance to secondary road is statistically non-significant and
negative (beta = -5.14e-06, 95% CI [-2.34e-05, 1.31e-05], p = 0.580; Std. beta
= -0.02, 95% CI [-0.09, 0.05])
- The effect of distance to tertiary road is statistically significant and
positive (beta = 9.68e-05, 95% CI [5.64e-05, 1.38e-04], p < .001; Std. beta =
0.16, 95% CI [0.09, 0.23])
- The effect of distance to city is statistically significant and negative
(beta = -1.69e-05, 95% CI [-2.38e-05, -9.92e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of distance to town is statistically significant and negative
(beta = -1.48e-05, 95% CI [-2.07e-05, -8.91e-06], p < .001; Std. beta = -0.18,
95% CI [-0.26, -0.11])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.30, 95% CI [-0.46, -0.14], p < .001; Std. beta = -0.30, 95% CI [-0.46,
-0.14])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.49], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.49])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.50, 95% CI [0.34, 0.67], p < .001; Std. beta
= 0.50, 95% CI [0.34, 0.67])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.29, 95% CI [0.48, 2.23], p = 0.003; Std.
beta = 1.29, 95% CI [0.48, 2.23])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
Below code chunk use blr_confusion_matrix to generate the binary logistic regression confusion matrix.
blr_confusion_matrix(model, cutoff = 0.5)Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
0 1301 738
1 813 1904
Accuracy : 0.6739
No Information Rate : 0.4445
Kappa : 0.3373
McNemars's Test P-Value : 0.0602
Sensitivity : 0.7207
Specificity : 0.6154
Pos Pred Value : 0.7008
Neg Pred Value : 0.6381
Prevalence : 0.5555
Detection Rate : 0.4003
Detection Prevalence : 0.5713
Balanced Accuracy : 0.6680
Precision : 0.7008
Recall : 0.7207
'Positive' Class : 1
The contingency table has 4 data cells:
1. Actual 0 Predicted 0 – The number of cases that were both predicted and
observed as 0. The records in this cell are referred to as true negatives.
The model classification was correct for these records.
2. Actual 0 Predicted 1 – The number of cases that were predicted as 1 yet
observed as 0. The records in this cell are referred to as false positives. The
model classification was incorrect for these records.
3. Actual 1 Predicted 1 – The number of cases that were both predicted and
observed as 1. The records in this cell are referred to as true positives. The
model classification was correct for these records.
4. Actual 1 Predicted 0 – The number of cases that were predicted as 0 yet
observed as 1. The records in this cell are referred to as false negatives. The
model classification was incorrect for these records.

The overall accuracy is the sum of the number of correctly classified observations
divided by the total number of observations. This not particularly useful in statistics point of view.
Sensitivity is also known as true positive rate or recall. It answers the question, “If
the model predicts a positive event, what is the probability that it really is positive?”
Specificity is the true negative rate. It answer the question, “If the model predicts a
negative event, what is the probability that it really is negative?”.
The false positive rate can be also defined as 1-specificity.
From above confusion matrix, the true positive (sensitivity) is 0.7207 and true negative (specificity) is 0.6154 which is slightly lower than true positive, it can be interpret as functional water point rate is slightly higher than non-functional wate point rate.
Geographically Weighted Regression Model
Converting sf to Spatial Point Dataframe
The code chunk below convert sf dataframe to Spatial Points Dataframe.
Osun_wp_sp <- Osun_wp_sf_clean %>%
select(c(status,
distance_to_primary_road,
distance_to_secondary_road,
distance_to_tertiary_road,
distance_to_city,
distance_to_town,
water_point_population,
local_population_1km,
is_urban,
usage_capacity,
water_source_clean)) %>%
as_Spatial()
Osun_wp_spclass : SpatialPointsDataFrame
features : 4756
extent : 182502.4, 290751, 340054.1, 450905.3 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 11
names : status, distance_to_primary_road, distance_to_secondary_road, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, is_urban, usage_capacity, water_source_clean
min values : 0, 0.014461356813335, 0.152195902540837, 0.017815121653488, 53.0461399623541, 30.0019777713073, 0, 0, 0, 1000, Borehole
max values : 1, 26909.8616132094, 19559.4793799085, 10966.2705628969, 47934.343603562, 44020.6393368124, 29697, 36118, 1, 300, Protected Spring
Computing Fixed Bandwidth
bw.fixed <- bw.ggwr(status ~ distance_to_primary_road +
distance_to_secondary_road +
distance_to_tertiary_road +
distance_to_city +
distance_to_town +
is_urban +
usage_capacity +
water_source_clean +
water_point_population +
local_population_1km,
data = Osun_wp_sp,
family = "binomial",
approach = "AIC",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE)Take a cup of tea and have a break, it will take a few minutes.
-----A kind suggestion from GWmodel development group
Iteration Log-Likelihood:(With bandwidth: 95768.67 )
=========================
0 -2889
1 -2836
2 -2830
3 -2829
4 -2829
5 -2829
Fixed bandwidth: 95768.67 AICc value: 5684.357
Iteration Log-Likelihood:(With bandwidth: 59200.13 )
=========================
0 -2875
1 -2818
2 -2810
3 -2808
4 -2808
5 -2808
Fixed bandwidth: 59200.13 AICc value: 5646.785
Iteration Log-Likelihood:(With bandwidth: 36599.53 )
=========================
0 -2847
1 -2781
2 -2768
3 -2765
4 -2765
5 -2765
6 -2765
Fixed bandwidth: 36599.53 AICc value: 5575.148
Iteration Log-Likelihood:(With bandwidth: 22631.59 )
=========================
0 -2798
1 -2719
2 -2698
3 -2693
4 -2693
5 -2693
6 -2693
Fixed bandwidth: 22631.59 AICc value: 5466.883
Iteration Log-Likelihood:(With bandwidth: 13998.93 )
=========================
0 -2720
1 -2622
2 -2590
3 -2581
4 -2580
5 -2580
6 -2580
7 -2580
Fixed bandwidth: 13998.93 AICc value: 5324.578
Iteration Log-Likelihood:(With bandwidth: 8663.649 )
=========================
0 -2601
1 -2476
2 -2431
3 -2419
4 -2417
5 -2417
6 -2417
7 -2417
Fixed bandwidth: 8663.649 AICc value: 5163.61
Iteration Log-Likelihood:(With bandwidth: 5366.266 )
=========================
0 -2436
1 -2268
2 -2194
3 -2167
4 -2161
5 -2161
6 -2161
7 -2161
8 -2161
9 -2161
Fixed bandwidth: 5366.266 AICc value: 4990.587
Iteration Log-Likelihood:(With bandwidth: 3328.371 )
=========================
0 -2157
1 -1922
2 -1802
3 -1739
4 -1713
5 -1713
Fixed bandwidth: 3328.371 AICc value: 4798.288
Iteration Log-Likelihood:(With bandwidth: 2068.882 )
=========================
0 -1751
1 -1421
2 -1238
3 -1133
4 -1084
5 -1084
Fixed bandwidth: 2068.882 AICc value: 4837.017
Iteration Log-Likelihood:(With bandwidth: 4106.777 )
=========================
0 -2297
1 -2095
2 -1997
3 -1951
4 -1938
5 -1936
6 -1936
7 -1936
8 -1936
Fixed bandwidth: 4106.777 AICc value: 4873.161
Iteration Log-Likelihood:(With bandwidth: 2847.289 )
=========================
0 -2036
1 -1771
2 -1633
3 -1558
4 -1525
5 -1525
Fixed bandwidth: 2847.289 AICc value: 4768.192
Iteration Log-Likelihood:(With bandwidth: 2549.964 )
=========================
0 -1941
1 -1655
2 -1503
3 -1417
4 -1378
5 -1378
Fixed bandwidth: 2549.964 AICc value: 4762.212
Iteration Log-Likelihood:(With bandwidth: 2366.207 )
=========================
0 -1874
1 -1573
2 -1410
3 -1316
4 -1274
5 -1274
Fixed bandwidth: 2366.207 AICc value: 4773.081
Iteration Log-Likelihood:(With bandwidth: 2663.532 )
=========================
0 -1979
1 -1702
2 -1555
3 -1474
4 -1438
5 -1438
Fixed bandwidth: 2663.532 AICc value: 4762.568
Iteration Log-Likelihood:(With bandwidth: 2479.775 )
=========================
0 -1917
1 -1625
2 -1468
3 -1380
4 -1339
5 -1339
Fixed bandwidth: 2479.775 AICc value: 4764.294
Iteration Log-Likelihood:(With bandwidth: 2593.343 )
=========================
0 -1956
1 -1674
2 -1523
3 -1439
4 -1401
5 -1401
Fixed bandwidth: 2593.343 AICc value: 4761.813
Iteration Log-Likelihood:(With bandwidth: 2620.153 )
=========================
0 -1965
1 -1685
2 -1536
3 -1453
4 -1415
5 -1415
Fixed bandwidth: 2620.153 AICc value: 4761.89
Iteration Log-Likelihood:(With bandwidth: 2576.774 )
=========================
0 -1950
1 -1667
2 -1515
3 -1431
4 -1393
5 -1393
Fixed bandwidth: 2576.774 AICc value: 4761.889
Iteration Log-Likelihood:(With bandwidth: 2603.584 )
=========================
0 -1960
1 -1678
2 -1528
3 -1445
4 -1407
5 -1407
Fixed bandwidth: 2603.584 AICc value: 4761.813
Iteration Log-Likelihood:(With bandwidth: 2609.913 )
=========================
0 -1962
1 -1680
2 -1531
3 -1448
4 -1410
5 -1410
Fixed bandwidth: 2609.913 AICc value: 4761.831
Iteration Log-Likelihood:(With bandwidth: 2599.672 )
=========================
0 -1958
1 -1676
2 -1526
3 -1443
4 -1405
5 -1405
Fixed bandwidth: 2599.672 AICc value: 4761.809
Iteration Log-Likelihood:(With bandwidth: 2597.255 )
=========================
0 -1957
1 -1675
2 -1525
3 -1441
4 -1403
5 -1403
Fixed bandwidth: 2597.255 AICc value: 4761.809
bw.fixed[1] 2599.672
From the above, we can see that the bandwidth calculated is 2599.672.
Building GWR Model (Fixed Bandwidth)
gwlr.fixed <- ggwr.basic(status ~
distance_to_primary_road +
distance_to_secondary_road +
distance_to_tertiary_road +
distance_to_city +
distance_to_town +
water_point_population +
local_population_1km +
is_urban +
usage_capacity +
water_source_clean,
data=Osun_wp_sp,
bw = bw.fixed,
family = "binomial",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE) Iteration Log-Likelihood
=========================
0 -1958
1 -1676
2 -1526
3 -1443
4 -1405
5 -1405
gwlr.fixed ***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2022-12-17 23:15:22
Call:
ggwr.basic(formula = status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town +
water_point_population + local_population_1km + is_urban +
usage_capacity + water_source_clean, data = Osun_wp_sp, bw = bw.fixed,
family = "binomial", kernel = "gaussian", adaptive = FALSE,
longlat = FALSE)
Dependent (y) variable: status
Independent variables: distance_to_primary_road distance_to_secondary_road distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km is_urban usage_capacity water_source_clean
Number of data points: 4756
Used family: binomial
***********************************************************************
* Results of Generalized linear Regression *
***********************************************************************
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-124.555 -1.755 1.072 1.742 34.333
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Intercept 3.887e-01 1.124e-01 3.459 0.000543
distance_to_primary_road -4.642e-06 6.490e-06 -0.715 0.474422
distance_to_secondary_road -5.143e-06 9.299e-06 -0.553 0.580230
distance_to_tertiary_road 9.683e-05 2.073e-05 4.671 3.00e-06
distance_to_city -1.686e-05 3.544e-06 -4.757 1.96e-06
distance_to_town -1.480e-05 3.009e-06 -4.917 8.79e-07
water_point_population -5.097e-04 4.484e-05 -11.369 < 2e-16
local_population_1km 3.451e-04 1.788e-05 19.295 < 2e-16
is_urbanTRUE -2.971e-01 8.185e-02 -3.629 0.000284
usage_capacity1000 -6.230e-01 6.972e-02 -8.937 < 2e-16
water_source_cleanProtected Shallow Well 5.040e-01 8.574e-02 5.878 4.14e-09
water_source_cleanProtected Spring 1.288e+00 4.388e-01 2.936 0.003325
Intercept ***
distance_to_primary_road
distance_to_secondary_road
distance_to_tertiary_road ***
distance_to_city ***
distance_to_town ***
water_point_population ***
local_population_1km ***
is_urbanTRUE ***
usage_capacity1000 ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6534.5 on 4755 degrees of freedom
Residual deviance: 5688.0 on 4744 degrees of freedom
AIC: 5712
Number of Fisher Scoring iterations: 5
AICc: 5712.099
Pseudo R-square value: 0.1295351
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Fixed bandwidth: 2599.672
Regression points: the same locations as observations are used.
Distance metric: A distance matrix is specified for this model calibration.
************Summary of Generalized GWR coefficient estimates:**********
Min. 1st Qu. Median
Intercept -8.7228e+02 -4.9955e+00 1.7600e+00
distance_to_primary_road -1.9389e-02 -4.8031e-04 2.9618e-05
distance_to_secondary_road -1.5921e-02 -3.7551e-04 1.2317e-04
distance_to_tertiary_road -1.5618e-02 -4.2368e-04 7.6179e-05
distance_to_city -1.8416e-02 -5.6217e-04 -1.2726e-04
distance_to_town -2.2411e-02 -5.7283e-04 -1.5155e-04
water_point_population -5.2208e-02 -2.2767e-03 -9.8875e-04
local_population_1km -1.2698e-01 4.9952e-04 1.0638e-03
is_urbanTRUE -1.9790e+02 -4.2908e+00 -1.6864e+00
usage_capacity1000 -2.0772e+01 -9.7231e-01 -4.1592e-01
water_source_cleanProtected.Shallow.Well -2.0789e+01 -4.5190e-01 5.3340e-01
water_source_cleanProtected.Spring -5.2235e+02 -5.5977e+00 2.5441e+00
3rd Qu. Max.
Intercept 1.2763e+01 1073.2154
distance_to_primary_road 4.8443e-04 0.0142
distance_to_secondary_road 6.0692e-04 0.0258
distance_to_tertiary_road 6.6814e-04 0.0128
distance_to_city 2.3718e-04 0.0150
distance_to_town 1.9271e-04 0.0224
water_point_population 5.0102e-04 0.1309
local_population_1km 1.8157e-03 0.0392
is_urbanTRUE 1.2841e+00 744.3097
usage_capacity1000 3.0322e-01 5.9281
water_source_cleanProtected.Shallow.Well 1.7849e+00 67.6343
water_source_cleanProtected.Spring 6.7663e+00 317.4123
************************Diagnostic information*************************
Number of data points: 4756
GW Deviance: 2795.084
AIC : 4414.606
AICc : 4747.423
Pseudo R-square value: 0.5722559
***********************************************************************
Program stops at: 2022-12-17 23:16:05
AIC for Generalized linear Regression model is 5712 compare with AIC of Geographically Weighted Regression which is 4414.606, so we can say Geographically Weighted Regression model performance is better than Generalized linear Regression model.
Converting SDF into sf dataframe
To access the performance of the gwLR, firstly, we will convert the SDF object in as dataframe by using the code chunk below.
gwr.fixed <- as.data.frame(gwlr.fixed$SDF)next, we will label yhat values greater or equal to 0.5 into 1 and else 0. The result of the logic comparison operation will be saved into a field called most.
gwr.fixed <- gwr.fixed %>%
mutate(most = ifelse(
gwr.fixed$yhat >= 0.5, T, F))confusionMatrix is using to show the confusion matrix of GW model using fixed bandwidth.
gwr.fixed$y <- as.factor(gwr.fixed$y)
gwr.fixed$most <- as.factor(gwr.fixed$most)
CM <- confusionMatrix(data=gwr.fixed$most, reference = gwr.fixed$y, positive = "TRUE")
CMConfusion Matrix and Statistics
Reference
Prediction FALSE TRUE
FALSE 1824 263
TRUE 290 2379
Accuracy : 0.8837
95% CI : (0.8743, 0.8927)
No Information Rate : 0.5555
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7642
Mcnemar's Test P-Value : 0.2689
Sensitivity : 0.9005
Specificity : 0.8628
Pos Pred Value : 0.8913
Neg Pred Value : 0.8740
Prevalence : 0.5555
Detection Rate : 0.5002
Detection Prevalence : 0.5612
Balanced Accuracy : 0.8816
'Positive' Class : TRUE
We can see from the CM result, True Positive rate improved for gwLR model.
Visulising gwLR
Osun_wp_sf_selected <- Osun_wp_sf_clean %>%
select(c(ADM2_EN, ADM2_PCODE,
ADM1_EN, ADM1_PCODE,
status))gwr_sf.fixed <- cbind(Osun_wp_sf_selected, gwr.fixed)tmap_mode("view")tmap mode set to interactive viewing
prob_T <- tm_shape(Osun) +
tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf.fixed) +
tm_dots(col = "yhat",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(8,14))
prob_TBuilding a Logistic Regression Models with removal of Statistically insingnificant variables
In the earlier report of binary logistic regression model, we saw two variables distance_to_primary_road and distance_to_secondary_road are not statistically significant at 95% Confidence Level. Therefore we should exclude these two variables when build the logistic regression model.
Below code chunk repeat above steps for rebuild the model for new fixed bandwidth without above two variables.
model_re <- glm(status ~ distance_to_tertiary_road +
distance_to_city +
distance_to_town +
is_urban +
usage_capacity +
water_source_clean +
water_point_population +
local_population_1km,
data = Osun_wp_sf_clean,
family = binomial(link = 'logit'))
blr_regress(model_re) Model Overview
------------------------------------------------------------------------
Data Set Resp Var Obs. Df. Model Df. Residual Convergence
------------------------------------------------------------------------
data status 4756 4755 4746 TRUE
------------------------------------------------------------------------
Response Summary
--------------------------------------------------------
Outcome Frequency Outcome Frequency
--------------------------------------------------------
0 2114 1 2642
--------------------------------------------------------
Maximum Likelihood Estimates
-----------------------------------------------------------------------------------------------
Parameter DF Estimate Std. Error z value Pr(>|z|)
-----------------------------------------------------------------------------------------------
(Intercept) 1 0.3540 0.1055 3.3541 8e-04
distance_to_tertiary_road 1 1e-04 0.0000 4.9096 0.0000
distance_to_city 1 0.0000 0.0000 -5.2022 0.0000
distance_to_town 1 0.0000 0.0000 -5.4660 0.0000
is_urbanTRUE 1 -0.2667 0.0747 -3.5690 4e-04
usage_capacity1000 1 -0.6206 0.0697 -8.9081 0.0000
water_source_cleanProtected Shallow Well 1 0.4947 0.0850 5.8228 0.0000
water_source_cleanProtected Spring 1 1.2790 0.4384 2.9174 0.0035
water_point_population 1 -5e-04 0.0000 -11.3902 0.0000
local_population_1km 1 3e-04 0.0000 19.4069 0.0000
-----------------------------------------------------------------------------------------------
Association of Predicted Probabilities and Observed Responses
---------------------------------------------------------------
% Concordant 0.7349 Somers' D 0.4697
% Discordant 0.2651 Gamma 0.4697
% Tied 0.0000 Tau-a 0.2320
Pairs 5585188 c 0.7349
---------------------------------------------------------------
report(model_re)We fitted a logistic model (estimated using ML) to predict status with
distance_to_tertiary_road (formula: status ~ distance_to_tertiary_road +
distance_to_city + distance_to_town + is_urban + usage_capacity +
water_source_clean + water_point_population + local_population_1km). The
model's explanatory power is moderate (Tjur's R2 = 0.16). The model's
intercept, corresponding to distance_to_tertiary_road = 0, is at 0.35 (95% CI
[0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_city (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_city = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with distance_to_town (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to distance_to_town = 0, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with is_urban (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to is_urban = [?], is at 0.35 (95%
CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with usage_capacity (formula: status ~
distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to usage_capacity = 300, is at 0.35
(95% CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_source_clean (formula: status
~ distance_to_tertiary_road + distance_to_city + distance_to_town + is_urban +
usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_source_clean = Borehole,
is at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation., We fitted a logistic model
(estimated using ML) to predict status with water_point_population (formula:
status ~ distance_to_tertiary_road + distance_to_city + distance_to_town +
is_urban + usage_capacity + water_source_clean + water_point_population +
local_population_1km). The model's explanatory power is moderate (Tjur's R2 =
0.16). The model's intercept, corresponding to water_point_population = 0, is
at 0.35 (95% CI [0.15, 0.56], p < .001). Within this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation. and We fitted a logistic
model (estimated using ML) to predict status with local_population_1km
(formula: status ~ distance_to_tertiary_road + distance_to_city +
distance_to_town + is_urban + usage_capacity + water_source_clean +
water_point_population + local_population_1km). The model's explanatory power
is moderate (Tjur's R2 = 0.16). The model's intercept, corresponding to
local_population_1km = 0, is at 0.35 (95% CI [0.15, 0.56], p < .001). Within
this model:
- The effect of distance to tertiary road is statistically significant and
positive (beta = 1.00e-04, 95% CI [6.03e-05, 1.40e-04], p < .001; Std. beta =
0.17, 95% CI [0.10, 0.24])
- The effect of distance to city is statistically significant and negative
(beta = -1.76e-05, 95% CI [-2.43e-05, -1.10e-05], p < .001; Std. beta = -0.19,
95% CI [-0.27, -0.12])
- The effect of distance to town is statistically significant and negative
(beta = -1.54e-05, 95% CI [-2.10e-05, -9.91e-06], p < .001; Std. beta = -0.19,
95% CI [-0.26, -0.12])
- The effect of is urbanTRUE is statistically significant and negative (beta =
-0.27, 95% CI [-0.41, -0.12], p < .001; Std. beta = -0.27, 95% CI [-0.41,
-0.12])
- The effect of usage capacity [1000] is statistically significant and negative
(beta = -0.62, 95% CI [-0.76, -0.48], p < .001; Std. beta = -0.62, 95% CI
[-0.76, -0.48])
- The effect of water source clean [Protected Shallow Well] is statistically
significant and positive (beta = 0.49, 95% CI [0.33, 0.66], p < .001; Std. beta
= 0.49, 95% CI [0.33, 0.66])
- The effect of water source clean [Protected Spring] is statistically
significant and positive (beta = 1.28, 95% CI [0.47, 2.22], p = 0.004; Std.
beta = 1.28, 95% CI [0.47, 2.22])
- The effect of water point population is statistically significant and
negative (beta = -5.10e-04, 95% CI [-6.01e-04, -4.26e-04], p < .001; Std. beta
= -0.74, 95% CI [-0.88, -0.62])
- The effect of local population 1km is statistically significant and positive
(beta = 3.45e-04, 95% CI [3.11e-04, 3.81e-04], p < .001; Std. beta = 1.45, 95%
CI [1.30, 1.60])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
blr_confusion_matrix(model_re, cutoff = 0.5)Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
0 1300 743
1 814 1899
Accuracy : 0.6726
No Information Rate : 0.4445
Kappa : 0.3348
McNemars's Test P-Value : 0.0761
Sensitivity : 0.7188
Specificity : 0.6149
Pos Pred Value : 0.7000
Neg Pred Value : 0.6363
Prevalence : 0.5555
Detection Rate : 0.3993
Detection Prevalence : 0.5704
Balanced Accuracy : 0.6669
Precision : 0.7000
Recall : 0.7188
'Positive' Class : 1
Computing Fixed Bandwidth
bw.fixed_re <- bw.ggwr(status ~ distance_to_tertiary_road +
distance_to_city +
distance_to_town +
is_urban +
usage_capacity +
water_source_clean +
water_point_population +
local_population_1km,
data = Osun_wp_sp,
family = "binomial",
approach = "AIC",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE)Take a cup of tea and have a break, it will take a few minutes.
-----A kind suggestion from GWmodel development group
Iteration Log-Likelihood:(With bandwidth: 95768.67 )
=========================
0 -2890
1 -2837
2 -2830
3 -2829
4 -2829
5 -2829
Fixed bandwidth: 95768.67 AICc value: 5681.18
Iteration Log-Likelihood:(With bandwidth: 59200.13 )
=========================
0 -2878
1 -2820
2 -2812
3 -2810
4 -2810
5 -2810
Fixed bandwidth: 59200.13 AICc value: 5645.901
Iteration Log-Likelihood:(With bandwidth: 36599.53 )
=========================
0 -2854
1 -2790
2 -2777
3 -2774
4 -2774
5 -2774
6 -2774
Fixed bandwidth: 36599.53 AICc value: 5585.354
Iteration Log-Likelihood:(With bandwidth: 22631.59 )
=========================
0 -2810
1 -2732
2 -2711
3 -2707
4 -2707
5 -2707
6 -2707
Fixed bandwidth: 22631.59 AICc value: 5481.877
Iteration Log-Likelihood:(With bandwidth: 13998.93 )
=========================
0 -2732
1 -2635
2 -2604
3 -2597
4 -2596
5 -2596
6 -2596
Fixed bandwidth: 13998.93 AICc value: 5333.718
Iteration Log-Likelihood:(With bandwidth: 8663.649 )
=========================
0 -2624
1 -2502
2 -2459
3 -2447
4 -2446
5 -2446
6 -2446
7 -2446
Fixed bandwidth: 8663.649 AICc value: 5178.493
Iteration Log-Likelihood:(With bandwidth: 5366.266 )
=========================
0 -2478
1 -2319
2 -2250
3 -2225
4 -2219
5 -2219
6 -2220
7 -2220
8 -2220
9 -2220
Fixed bandwidth: 5366.266 AICc value: 5022.016
Iteration Log-Likelihood:(With bandwidth: 3328.371 )
=========================
0 -2222
1 -2002
2 -1894
3 -1838
4 -1818
5 -1814
6 -1814
Fixed bandwidth: 3328.371 AICc value: 4827.587
Iteration Log-Likelihood:(With bandwidth: 2068.882 )
=========================
0 -1837
1 -1528
2 -1357
3 -1261
4 -1222
5 -1222
Fixed bandwidth: 2068.882 AICc value: 4772.046
Iteration Log-Likelihood:(With bandwidth: 1290.476 )
=========================
0 -1403
1 -1016
2 -807.3
3 -680.2
4 -680.2
Fixed bandwidth: 1290.476 AICc value: 5809.719
Iteration Log-Likelihood:(With bandwidth: 2549.964 )
=========================
0 -2019
1 -1753
2 -1614
3 -1538
4 -1506
5 -1506
Fixed bandwidth: 2549.964 AICc value: 4764.056
Iteration Log-Likelihood:(With bandwidth: 2847.289 )
=========================
0 -2108
1 -1862
2 -1736
3 -1670
4 -1644
5 -1644
Fixed bandwidth: 2847.289 AICc value: 4791.834
Iteration Log-Likelihood:(With bandwidth: 2366.207 )
=========================
0 -1955
1 -1675
2 -1525
3 -1441
4 -1407
5 -1407
Fixed bandwidth: 2366.207 AICc value: 4755.524
Iteration Log-Likelihood:(With bandwidth: 2252.639 )
=========================
0 -1913
1 -1623
2 -1465
3 -1376
4 -1341
5 -1341
Fixed bandwidth: 2252.639 AICc value: 4759.188
Iteration Log-Likelihood:(With bandwidth: 2436.396 )
=========================
0 -1980
1 -1706
2 -1560
3 -1479
4 -1446
5 -1446
Fixed bandwidth: 2436.396 AICc value: 4756.675
Iteration Log-Likelihood:(With bandwidth: 2322.828 )
=========================
0 -1940
1 -1656
2 -1503
3 -1417
4 -1382
5 -1382
Fixed bandwidth: 2322.828 AICc value: 4756.471
Iteration Log-Likelihood:(With bandwidth: 2393.017 )
=========================
0 -1965
1 -1687
2 -1539
3 -1456
4 -1422
5 -1422
Fixed bandwidth: 2393.017 AICc value: 4755.57
Iteration Log-Likelihood:(With bandwidth: 2349.638 )
=========================
0 -1949
1 -1668
2 -1517
3 -1432
4 -1398
5 -1398
Fixed bandwidth: 2349.638 AICc value: 4755.753
Iteration Log-Likelihood:(With bandwidth: 2376.448 )
=========================
0 -1959
1 -1680
2 -1530
3 -1447
4 -1413
5 -1413
Fixed bandwidth: 2376.448 AICc value: 4755.48
Iteration Log-Likelihood:(With bandwidth: 2382.777 )
=========================
0 -1961
1 -1683
2 -1534
3 -1450
4 -1416
5 -1416
Fixed bandwidth: 2382.777 AICc value: 4755.491
Iteration Log-Likelihood:(With bandwidth: 2372.536 )
=========================
0 -1958
1 -1678
2 -1528
3 -1445
4 -1411
5 -1411
Fixed bandwidth: 2372.536 AICc value: 4755.488
Iteration Log-Likelihood:(With bandwidth: 2378.865 )
=========================
0 -1960
1 -1681
2 -1532
3 -1448
4 -1414
5 -1414
Fixed bandwidth: 2378.865 AICc value: 4755.481
Iteration Log-Likelihood:(With bandwidth: 2374.954 )
=========================
0 -1959
1 -1679
2 -1530
3 -1446
4 -1412
5 -1412
Fixed bandwidth: 2374.954 AICc value: 4755.482
Iteration Log-Likelihood:(With bandwidth: 2377.371 )
=========================
0 -1959
1 -1680
2 -1531
3 -1447
4 -1413
5 -1413
Fixed bandwidth: 2377.371 AICc value: 4755.48
Iteration Log-Likelihood:(With bandwidth: 2377.942 )
=========================
0 -1960
1 -1680
2 -1531
3 -1448
4 -1414
5 -1414
Fixed bandwidth: 2377.942 AICc value: 4755.48
Iteration Log-Likelihood:(With bandwidth: 2377.018 )
=========================
0 -1959
1 -1680
2 -1531
3 -1447
4 -1413
5 -1413
Fixed bandwidth: 2377.018 AICc value: 4755.48
bw.fixed_re[1] 2377.371
From the above, we can see that the new bandwidth calculated is 2377.371.
Building GWR Model (Fixed Bandwidth)
gwlr.fixed_re <- ggwr.basic(status ~distance_to_tertiary_road +
distance_to_city +
distance_to_town +
water_point_population +
local_population_1km +
is_urban +
usage_capacity +
water_source_clean,
data=Osun_wp_sp,
bw = bw.fixed_re,
family = "binomial",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE) Iteration Log-Likelihood
=========================
0 -1959
1 -1680
2 -1531
3 -1447
4 -1413
5 -1413
gwlr.fixed_re ***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2022-12-17 23:27:11
Call:
ggwr.basic(formula = status ~ distance_to_tertiary_road + distance_to_city +
distance_to_town + water_point_population + local_population_1km +
is_urban + usage_capacity + water_source_clean, data = Osun_wp_sp,
bw = bw.fixed_re, family = "binomial", kernel = "gaussian",
adaptive = FALSE, longlat = FALSE)
Dependent (y) variable: status
Independent variables: distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km is_urban usage_capacity water_source_clean
Number of data points: 4756
Used family: binomial
***********************************************************************
* Results of Generalized linear Regression *
***********************************************************************
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-129.368 -1.750 1.074 1.742 34.126
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Intercept 3.540e-01 1.055e-01 3.354 0.000796
distance_to_tertiary_road 1.001e-04 2.040e-05 4.910 9.13e-07
distance_to_city -1.764e-05 3.391e-06 -5.202 1.97e-07
distance_to_town -1.544e-05 2.825e-06 -5.466 4.60e-08
water_point_population -5.098e-04 4.476e-05 -11.390 < 2e-16
local_population_1km 3.452e-04 1.779e-05 19.407 < 2e-16
is_urbanTRUE -2.667e-01 7.474e-02 -3.569 0.000358
usage_capacity1000 -6.206e-01 6.966e-02 -8.908 < 2e-16
water_source_cleanProtected Shallow Well 4.947e-01 8.496e-02 5.823 5.79e-09
water_source_cleanProtected Spring 1.279e+00 4.384e-01 2.917 0.003530
Intercept ***
distance_to_tertiary_road ***
distance_to_city ***
distance_to_town ***
water_point_population ***
local_population_1km ***
is_urbanTRUE ***
usage_capacity1000 ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6534.5 on 4755 degrees of freedom
Residual deviance: 5688.9 on 4746 degrees of freedom
AIC: 5708.9
Number of Fisher Scoring iterations: 5
AICc: 5708.923
Pseudo R-square value: 0.129406
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Fixed bandwidth: 2377.371
Regression points: the same locations as observations are used.
Distance metric: A distance matrix is specified for this model calibration.
************Summary of Generalized GWR coefficient estimates:**********
Min. 1st Qu. Median
Intercept -3.7021e+02 -4.3797e+00 3.5590e+00
distance_to_tertiary_road -3.1622e-02 -4.5462e-04 9.1291e-05
distance_to_city -5.4555e-02 -6.5623e-04 -1.3507e-04
distance_to_town -8.6549e-03 -5.2754e-04 -1.6785e-04
water_point_population -2.9696e-02 -2.2705e-03 -1.2277e-03
local_population_1km -7.7730e-02 4.4281e-04 1.0548e-03
is_urbanTRUE -7.3554e+02 -3.4675e+00 -1.6596e+00
usage_capacity1000 -5.5889e+01 -1.0347e+00 -4.1960e-01
water_source_cleanProtected.Shallow.Well -1.8842e+02 -4.7295e-01 6.2378e-01
water_source_cleanProtected.Spring -1.3630e+03 -5.3436e+00 2.7714e+00
3rd Qu. Max.
Intercept 1.3755e+01 2171.6373
distance_to_tertiary_road 6.3011e-04 0.0237
distance_to_city 1.5921e-04 0.0162
distance_to_town 2.4490e-04 0.0179
water_point_population 4.5879e-04 0.0765
local_population_1km 1.8479e-03 0.0333
is_urbanTRUE 1.0554e+00 995.1840
usage_capacity1000 3.9113e-01 9.2449
water_source_cleanProtected.Shallow.Well 1.9564e+00 66.8914
water_source_cleanProtected.Spring 7.0805e+00 208.3749
************************Diagnostic information*************************
Number of data points: 4756
GW Deviance: 2815.659
AIC : 4418.776
AICc : 4744.213
Pseudo R-square value: 0.5691072
***********************************************************************
Program stops at: 2022-12-17 23:28:00
AIC for Generalized linear Regression model is 5708.923 compare with AIC of Geographically Weighted Regression which is 4418.776, so we can say Geographically Weighted Regression model performance is better than Generalized linear Regression model.
Converting SDF into sf dataframe
To access the performance of the gwLR, firstly, we will convert the SDF object in as dataframe by using the code chunk below.
gwr.fixed_re <- as.data.frame(gwlr.fixed_re$SDF)gwr.fixed_re <- gwr.fixed_re %>%
mutate(most = ifelse(
gwr.fixed_re$yhat >= 0.5, T, F))gwr.fixed_re$y <- as.factor(gwr.fixed_re$y)
gwr.fixed_re$most <- as.factor(gwr.fixed_re$most)
CM_re <- confusionMatrix(data=gwr.fixed_re$most, reference = gwr.fixed_re$y, positive = "TRUE")
CM_reConfusion Matrix and Statistics
Reference
Prediction FALSE TRUE
FALSE 1833 268
TRUE 281 2374
Accuracy : 0.8846
95% CI : (0.8751, 0.8935)
No Information Rate : 0.5555
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7661
Mcnemar's Test P-Value : 0.6085
Sensitivity : 0.8986
Specificity : 0.8671
Pos Pred Value : 0.8942
Neg Pred Value : 0.8724
Prevalence : 0.5555
Detection Rate : 0.4992
Detection Prevalence : 0.5582
Balanced Accuracy : 0.8828
'Positive' Class : TRUE
We can see from the CM result, True Positive rate improved for gwLR model.
Visulising gwLR
gwr_sf.fixed_re <- cbind(Osun_wp_sf_selected, gwr.fixed_re)tmap_mode("view")tmap mode set to interactive viewing
prob_T_re <- tm_shape(Osun) +
tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf.fixed_re) +
tm_dots(col = "yhat",
border.col = "gray60",
border.lwd = 1) +
tm_view(set.zoom.limits = c(8,14))
prob_T_rePerformance Comparison
Table below shows the True positive and True negative comparison of 4 logistic regression models.
Performance Measure |
Generalized linear Regression (all variables) | gwLR Model (all variables) | Generalized linear Regression (exclude 2 var) | gwLR Model (exclude 2 var) |
|---|---|---|---|---|
| TP | 0.7207 | 0.9005 | 0.7188 | 0.8986 |
| TN | 0.6154 | 0.8628 | 0.6149 | 0.8671 |
From above comparison table, the True Positive did not improve after we exclude the two in-significant variables. However, the True Negative of gwLR model improved slightly with exclude the two variables. So we may consider to use gwLR model if we want to focus on analysing non-functional water point ratio. On the other hand, compare the results of GW regression model and generalized linear regression model, it cleared shown that the GW regression model is more robust either in AIC value or the performance measures.